ABSTRACT

Recent reports on mathematics education reform have 
focused the attention of educational practitioners and policymakers 
on new goals for mathematics education and new descriptions of 
mathematical proficiency. QUASAR is a national project (Quantitative 
Understanding: Amplifying Student Achievement and Reasoning) designed 
to improve the mathematics instructional program for students 
attending middle schools, grades 6 through 8, in economically 
disadvantaged communities. QUASAR is a complex research study of 
educational change and improvement, in which a major effort will be 
made to study carefully different approaches to unblocking the path 
to mathematical power for poor students. Parallel goals for the study
are to ascertain conditions that appear conducive to mathematical
success; to derive pedagogical principles for effective mathematics
instruction for middle school students; to describe effective
instructional programs that are adaptable to other schools; and to
devise new assessment tools to measure growth in higher order
thinking, reasoning, and communication as they relate to school
mathematics. Included in this report are: (1) an introduction that
describes the purpose, the rationale, and the goals of this project;
(2) a discussion of the educational considerations and mathematical
conceptualizations underlying the proposed methods of assessment for
mathematical proficiency; (3) a discussion of construct-irrelevant 
test variance as a data-gathering consideration for the assessment of 
mathematical proficiency; (4) a discussion of the development of 
specifications for the assessment tasks in terms of focus and 
components; (5) a discussion of the specifications encompassing the 
scoring rubrics within the assessment procedures; and (6) a list of 
sample tasks and administrative information. (15 references) 
(Author/JJK) 
Mathematics education reform is currently a topic of great interest in the United
States. Reports by the National Academy of Sciences (National Research Council, 1989),
the American Association for the Advancement of Science (1989), and the National Council
of Teachers of Mathematics (1989) have focused the attention of educational practitioners
and policy makers on new goals for mathematics education and new descriptions of
mathematical proficiency. Terms like reasoning, communication, problem solving,
conceptual understanding, and mathematical power are used frequently to describe an
expanded view of mathematical proficiency that goes beyond memorization and mere
competence in the basic skills of rational number computation. The reform discussion has
thus led naturally to considerations of how to assess students' attainments with respect to
this new vision of mathematical proficiency and how to assess improvements that may
result from curricular and instructional reforms. This paper
focuses on the efforts of one project to deal with the interface between assessment and
instructional reform.

QUASAR (Quantitative Understanding: Amplifying Student Achievement and
Reasoning) is a national project designed to improve the mathematics instructional program
for students attending middle schools (grades 6-8) in economically disadvantaged
communities (Silver, 1989). Currently operating at 6 school sites dispersed across the
United States (Silver, Smith, Lane, Salmon-Cox, & Stein, 1990), QUASAR is a practical
school demonstration project which posits that students in these communities can and will
learn a broader range of mathematical content, acquire a deeper and more meaningful
understanding of mathematical ideas, and demonstrate an ability to reason and solve
appropriately complex problems. When implemented, such instructional programs will
stand in stark contrast to those characterized by what might be called "assembly line"
mathematics instruction: a program of repetitive drill and practice on basic computation
that has characterized middle school mathematics education for many American students
and that has relegated disproportionate numbers of poor students to the remedial track,
thereby blocking their access to most socially acceptable paths to status and success.
QUASAR is also a complex research study of educational change and improvement, in
which a major effort will be made to study carefully different approaches to accomplishing
this general goal, to ascertain conditions that appear to be conducive to success, to derive
instructional principles for effective mathematics instruction for middle school students, to
describe effective instructional programs in ways that will allow their adaptation to other
schools, and to devise new assessment tools to measure growth in high-level thinking,
reasoning, and communication as they relate to mathematics.

Given the goals and aspirations of the QUASAR project, it is imperative that
appropriate measures be developed to monitor and evaluate program impact. One important
set of indicators consists of those that pertain to growth in student knowledge and proficiency
over time. Development of the assessments for the QUASAR project has utilized an approach
advocated in the National Council of Teachers of Mathematics' Curriculum and Evaluation
Standards for School Mathematics (1989). That report argued for improving the alignment
of testing with curriculum goals, advocated the use of multiple sources of assessment
information, and suggested that more attention be given both to appropriate methods of
assessment and to the proper use of assessment information. With respect to the methods
of assessment, the report asserted that an authentic assessment of mathematical proficiency
would need to address such areas as problem solving, communication, reasoning, and
disposition, as well as concepts and procedures.

The QUASAR project will employ a variety of measures in assessing student growth,
including paper-and-pencil cognitive assessment tasks administered to individual students
in a large group setting; tasks administered to students in small groups, and on which they
are expected to work collaboratively; individually administered performance assessments,
which may involve the use of manipulative materials and computational tools; tasks
designed to provide information on metacognitive processes used in problem solving; and
non-cognitive assessments aimed at important attitudes, beliefs, and dispositions. Teachers
at the project sites are also asked to supply information available from their own classroom
sources (e.g., tests, homework, projects) to supplement the store of information about both
the program and individual students.

In the development of assessments, the project has attempted to keep a balanced
perspective regarding psychometric constraints and educational needs. This has been
possible because the coordinator of assessment development (S. Lane) is a psychometrician
by training and the project director (E. Silver) is a mathematics educator. We believe that
this balanced perspective is essential for significant progress to be made in establishing
alternative assessments as possible replacements for or supplements to the current system
of standardized, multiple-choice testing that has become entrenched in the United States.
This paper presents an overview of the design principles for the development of the
paper-and-pencil mathematics assessment instrument that is administered to individual
students in a large group setting.

The QUASAR assessments are designed to provide programmatic rather than
individual student information. In other words, we are not attempting to provide valid,
reliable indicators for the purpose of evaluating individual students; rather, we have
designed a system that will collect data from individual students but will provide evaluative
information only at the program level. Therefore, a relatively large number of assessment
tasks (currently about 36) is administered at each project site, but each student completes
only a small number of the tasks (about 9) on each administration occasion. Because of
our focus on program evaluation, use of this approach allows us to avoid the difficulty of
sampling only a small range of tasks. Over time, it is planned to release some assessment
tasks and add new ones. The public release of tasks and scoring rubrics should allow for a
clearer understanding of the nature of mathematical proficiencies being assessed and the
judgment criteria that are applied in the evaluation of responses. The addition of new tasks
each year will allow the QUASAR assessment instrument to expand to include not only
tasks that reflect important general instructional emphases and topics but also some tasks
that have been tailored to reflect the unique features of instructional programs that vary
across sites; these latter tasks could be developed in close cooperation with the teachers and
resource partners at each project site.
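To make the program-level design concrete, here is a minimal sketch. It assumes each scored response is recorded as a hypothetical (student, task, score) triple on the 0-4 scale described later in this paper. Because each student sees only a subset of the task pool, results are summarized per task across all students at a site rather than per student. The function name `program_level_summary` and the choice of a simple mean are illustrative assumptions, not the project's documented analysis.

```python
from collections import defaultdict
from statistics import mean

def program_level_summary(responses):
    """Pool matrix-sampled scores by task (hypothetical helper).

    responses: iterable of (student_id, task_id, score) tuples, score in 0-4.
    Returns {task_id: (number_of_responses, mean_score)}. No per-student
    totals are produced, because no student attempts the whole task pool.
    """
    by_task = defaultdict(list)
    for _student_id, task_id, score in responses:
        if score not in range(5):
            raise ValueError(f"score must be 0-4, got {score!r}")
        by_task[task_id].append(score)
    return {task: (len(scores), mean(scores)) for task, scores in by_task.items()}

# Illustrative, made-up records: three students, each seeing a different subset of tasks.
records = [
    ("S1", "T01", 3), ("S1", "T02", 1),
    ("S2", "T01", 1), ("S2", "T03", 4),
    ("S3", "T02", 2), ("S3", "T03", 2),
]
print(program_level_summary(records))
```

Every task in the pool is attempted by some students, so the summary covers the full range of tasks even though each individual record is sparse. This is the trade-off the paragraph above describes.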

Given the goals of the QUASAR project regarding instructional program emphases
on breadth of content, tasks have been developed to assess students' knowledge across a
wide range of content areas, going well beyond whole numbers and arithmetic. Also,
given the project's goals related to high-level thinking and deep conceptual understanding,
the assessment tasks focus on mathematical reasoning, problem solving, and modeling,
and on students' understanding of the features that characterize mathematical concepts and
their interrelationships. Due to space limitations, the description of QUASAR assessment
in this paper will be quite brief in some places. Further details regarding the design
principles and conceptual framework for the assessment can be found in Lane (1991).

QUASAR's Assessment of Mathematical Proficiency: Some Educational Considerations

The parameters that characterize QUASAR's vision of mathematical ability and
mathematical power have been described to a large extent in the Curriculum and Evaluation
Standards for School Mathematics (National Council of Teachers of Mathematics, 1989),
which suggest the importance of understanding concepts and procedures, becoming a
mathematical problem solver, learning to reason mathematically, making connections
among mathematical topics and between mathematics and the world outside the
mathematics classroom, and learning to communicate mathematical ideas. The vision is
also consistent with that of the Mathematical Sciences Education Board (National Research
Council, 1990), which argued that mathematical power involved the development of the
abilities to understand mathematical concepts, principles, and procedures, to discern
mathematical relations, to reason mathematically, and to apply mathematical concepts,
principles, and procedures to solve a variety of nonroutine problems.

In this view, mathematics is conceptualized as involving problems that are complex,
yield multiple solutions, require judgment and interpretation, require finding structure, and
require finding a path for a solution that is not immediately visible. Furthermore, success
in mathematical problem solving is viewed as being related to and at least partially
dependent on students' beliefs about the nature of mathematics and problem solving,
attitudes towards and interest in mathematics, and the sociocultural context (Lester &
Kroll, 1990; Silver, 1985). Specifications for the QUASAR assessment tasks were based
upon these conceptualizations of mathematical proficiency.

QUASAR's Assessment of Mathematical Proficiency: Some Measurement Considerations

An assessment instrument is an imperfect measure of a construct because it either
underrepresents the construct domain (i.e., the assessment instrument is too narrow) or, in
addition to measuring the construct domain, also measures something that is irrelevant to
the construct (i.e., irrelevant excess reliable variance), or some combination of the two
(Messick, 1989). To ensure that the construct domain is fully represented, QUASAR's
assessment of mathematical proficiency is sensitive to many facets, including mathematical
reasoning, mathematical communication, knowledge and use of strategies and
representations, and knowledge and use of mathematical concepts, principles, and
procedures. Moreover, the assessment attends to the fact that these facets interact with
various mathematical content areas such as number sense, geometry, and statistics.

Two kinds of construct-irrelevant test variance are proposed by Messick (1989):
construct-irrelevant easiness and construct-irrelevant difficulty. Construct-irrelevant
easiness refers to the potential for clues or flaws in task format that may allow some
students to respond correctly in ways that are irrelevant to the construct domain being
measured, and that may lead to scores that are invalidly high. Construct-irrelevant
difficulty refers to the possibility that the assessment instrument is, for irrelevant reasons,
more difficult for some groups of students. In QUASAR's assessments of students'
abilities to think and reason mathematically, we were sensitive to several potential irrelevant
constructs that could adversely affect some groups of students, such as differences in
reading comprehension ability, writing ability, or familiarity with task contexts. Therefore,
the degree of reading and writing required of the student by the task was considered in
developing open-ended assessment tasks and scoring rubrics, as was the likely familiarity
of the task contexts to students of differing cultural and ethnic backgrounds. Not only
were these two sources of invalidity considered in the process of constructing the
assessment tasks and corresponding scoring rubrics, but they will also be considered when
interpreting student performances.

Another measurement issue relates to the reliance on a single measure of a complex 
construct. To triangulate observations of a complex construct, multiple measures are 
needed. To measure program outcomes and growth in the QUASAR project, the core 
assessment instrument incorporates a number of task formats (e.g., requiring a student to
justify a selected answer vs. showing the solution process used to arrive at an answer) and 
process constraints (e.g., producing a numerical answer vs. drawing a diagram). 
Moreover, as Baker (1990) has noted, any measurement procedure must be understood in 
the light of other available information and the intended uses of the scores. Therefore, 
information will also be obtained about classroom processes, students' class assignments 
and assessments, teachers' knowledge and beliefs about mathematics, and students' beliefs 
about and disposition towards mathematics. 

Specification of the Assessment Tasks 

The development of QUASAR's assessment tasks and scoring rubrics involves a
collaborative effort by a team consisting of mathematics educators, mathematicians,
cognitive psychologists, and psychometricians. Our approach is related to but somewhat
different from other examples of alternative assessment frameworks (e.g., Nitko & Lane,
1990; Pandey, 1990; Romberg, Zarinnia, & Collis, 1990). The assessment tasks are
specified in terms of four components: cognitive processes, mathematical content, mode of
representation, and task context. With a particular focus on mathematical problem solving
and mathematical reasoning, the cognitive processes that were specified for task 
development included the following: understanding and representing problems, discerning 
mathematical relations, organizing information, using and discovering strategies and 
heuristics, using and discovering procedures, formulating conjectures, evaluating the 
reasonableness of answers, generalizing results, and justifying answers or procedures. 
The content categories included the following: number and operations (involving decimals,
fractions, ratios, and proportions); estimation (both computational and measurement);
patterns (both numerical and geometric/spatial patterns); algebra (especially tasks related to
the transition from arithmetic to algebra); geometry and measurement; and data analysis
(including probability and statistics). The representations used in task development, and
expected of students in the scoring rubrics, include written,
pictorial, graphic, tabular, and arithmetic representations. With respect to task context, an
attempt was made to embed as many tasks as possible within an appropriate context, provided
this could be done without requiring an excessive amount of reading on the part of the students.

Specification of Scoring Rubrics 

A focused holistic scoring method is being used to score students' responses to each 
task. A generalized scoring rubric was designed to incorporate three interrelated 
components related to the task development specifications described above: mathematical 
conceptual and procedural knowledge, strategic knowledge, and communication. With 
respect to mathematical knowledge, attention is paid to the extent to which students 
demonstrate their knowledge of mathematical concepts, principles, and procedures, such as
understanding relationships among problem elements; using appropriate mathematical
terminology or notation; recognizing when a procedure is appropriate; executing
procedures; verifying results of procedures; and generating or extending familiar
procedures. In the area of strategic knowledge, students are expected to use models,
diagrams, and symbols to represent and integrate concepts in addition to being systematic
in their application of strategies. The area of communication relates to students' ability to
communicate their mathematical ideas in writing, symbolically, or visually; to use
mathematical vocabulary, notation, and structure to represent ideas; and to describe
relationships and model situations. Some tasks require the justification of answers through
the use of appropriate modes of communication (e.g., written, pictorial, graphical, or
algebraic methods) for expressing the integration of mathematical ideas, conjectures, and
arguments; other tasks require the description of strategies or patterns.

The scoring rubrics developed by the California Assessment Program (California
State Department of Education, 1989) provided a basis for the development of QUASAR's
generalized rubric. In developing the generalized scoring rubric, criteria representing the
three interrelated components were specified for each of five score levels (0-4). Based on
the specified criteria at each score level, a specific rubric was developed for each task. The
emphasis on each component for a specific rubric was dependent upon the demands of the
task. In addition to scoring the student responses using the scoring rubric developed for
each task, the student responses will be evaluated using other, more analytic procedures.
These latter analyses should provide more detailed information regarding the types of
representations and strategies students use, the nature of errors or misconceptions in
students' work, and the nature of the mathematical knowledge and cognitive processes
underlying successful performance.

Sample Tasks and Administration Information

For the 1990-91 school year, a set of thirty-six assessment tasks was developed for 
use with sixth-grade students. The thirty-six tasks were divided into four sets of nine 
different tasks, which were randomly distributed to students in each classroom. Students 
received a different set in each of the Fall and Spring administrations. Two examples of
assessment tasks similar to those used in the QUASAR project are provided in Figure 1. 
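The distribution scheme just described can be sketched as follows: four nine-task sets are drawn from a pool of thirty-six and randomly distributed within each classroom, and each student receives a different set in the Spring. This is a minimal illustration that rests on three assumptions. The sets are formed as consecutive blocks of the pool. Set indices are dealt out in a balanced rotation before shuffling. The Spring set is obtained by rotating each student's Fall set index. The function names and the rotation rule are hypothetical; the source states only that the sets were randomly distributed and that students received a different set in each administration.

```python
import random

def build_forms(task_ids, n_forms):
    """Partition the task pool into n_forms disjoint sets of equal size."""
    if len(task_ids) % n_forms != 0:
        raise ValueError("the task pool must divide evenly into forms")
    size = len(task_ids) // n_forms
    return [task_ids[i * size:(i + 1) * size] for i in range(n_forms)]

def assign_fall_forms(student_ids, n_forms, rng):
    """Randomly assign a form index to each student in one classroom.

    Indices are dealt in rotation (0, 1, 2, 3, 0, ...) and then shuffled, so
    each form is used about equally often but who gets which form is random.
    """
    indices = [i % n_forms for i in range(len(student_ids))]
    rng.shuffle(indices)
    return dict(zip(student_ids, indices))

def spring_form(fall_index, n_forms, offset=1):
    """Hypothetical rotation rule guaranteeing a different form in Spring."""
    if offset % n_forms == 0:
        raise ValueError("offset must not be a multiple of n_forms")
    return (fall_index + offset) % n_forms

tasks = [f"T{k:02d}" for k in range(1, 37)]        # 36-task pool
forms = build_forms(tasks, 4)                       # four sets of nine
rng = random.Random(1990)                           # fixed seed for reproducibility
students = [f"S{k:02d}" for k in range(1, 26)]      # one hypothetical 25-student classroom
fall = assign_fall_forms(students, len(forms), rng)
spring = {s: spring_form(i, len(forms)) for s, i in fall.items()}
```

Rotating the index is only one way to satisfy the "different set" constraint. Drawing at random from the three sets a student has not yet seen would serve equally well.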

For the first task, it is expected that a student would draw a 9-by-9 square on the grid 
provided and shade the square in. Also it is expected that a student would describe the 
pattern by saying "It is a pattern of squares with odd sides: 1, 3, 5, 7, 9, 11, and so on;"
or "In the pattern you add 2 rows and 2 columns to each square to get the next square;" or
some other similar description. In the next task, we would expect that a student's response
would show evidence of a clear reasoning process. For example, a student might answer 
"no" and provide an explanation, such as "Yvonne takes the bus eight times in the week, 
and this would cost $8.00. Since the bus pass costs $9.00, she should not buy the pass." 
It is possible, however, that a student might answer "yes" and provide a logical reason, 
such as "Yvonne should buy the bus pass because she rides the bus eight times for work
and this costs $8.00. If she rides the bus on weekends (to go shopping, etc.), it would
cost $2.00 or more, and that would be more than $9.00 altogether, so she can save money 
with the bus pass." As this example suggests, tasks presented in this open-ended format 
may allow for more than one possible correct answer. 
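The expected answers above rest on simple arithmetic, which can be checked directly. The sketch below assumes the fares shown in Figure 1 ($1.00 one way, $9.00 for a weekly pass) and counts rides as the task describes them. The weekend rides are the hypothetical extra trips from the second sample response, not part of the task itself. Amounts are kept in cents to avoid floating-point rounding. The `square_side` helper encodes the odd-side description (1, 3, 5, 7, 9, ...) from the first sample response; all names here are illustrative.

```python
ONE_WAY_CENTS = 100      # One Way fare: $1.00
WEEKLY_PASS_CENTS = 900  # Weekly Pass: $9.00

# Rides from the task text: round trips on Monday, Wednesday, and Friday;
# a ride to work only on Tuesday and Thursday.
weekday_rides = {"Mon": 2, "Tue": 1, "Wed": 2, "Thu": 1, "Fri": 2}

def fare_cost_cents(rides_per_day):
    """Cost of paying the one-way fare for every ride, in cents."""
    return sum(rides_per_day.values()) * ONE_WAY_CENTS

def pass_is_cheaper(rides_per_day):
    """True when the weekly pass costs strictly less than paying per ride."""
    return WEEKLY_PASS_CENTS < fare_cost_cents(rides_per_day)

def square_side(n):
    """Side length of the nth figure in the odd-sided square pattern 1, 3, 5, ..."""
    return 2 * n - 1

with_weekend = dict(weekday_rides, Sat=1, Sun=1)   # hypothetical weekend shopping trips
print(fare_cost_cents(weekday_rides), pass_is_cheaper(weekday_rides))  # 800 False
print(fare_cost_cents(with_weekend), pass_is_cheaper(with_weekend))    # 1000 True
print(square_side(5))                                                  # 9, a 9-by-9 square
```

Both sample answers to the bus task are consistent with this arithmetic. They differ only in whether weekend rides are assumed, which is why the rubric can credit either "yes" or "no" when the reasoning is sound.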

After student responses have been obtained, the papers are scored by teams of 
classroom teachers who are trained as raters. The raters use the scoring rubric for each task 
in order to assign a score between 0 and 4 to each student's response. In addition to these 
holistic judgments, student responses will be subjected to further examination and analysis
in order to probe for systematic error patterns, cognitive process information, data
regarding strategy usage, and other important insights related to the mathematical
knowledge and performance of the students. 

As noted earlier, QUASAR intends to use a wide range of assessment procedures. In 
addition to open-ended tasks similar to those shown in Figure 1, QUASAR will also utilize
some performance assessments involving use of manipulative materials or computational
tools, such as calculators. Performance assessments have been developed and will be
implemented on a pilot basis during the 1990-91 school year. Tasks assessing students 
working in small groups are also planned for the near future. 
Figure 1

Sample Assessment Tasks




A. Draw the 5th figure: 



B. Describe the pattern.

Task 2 - Mathematical Content: Numbers and Operations
The table below shows the cost for different bus fares.



BUSY BUS COMPANY
FARES

One Way       $1.00
Weekly Pass   $9.00



Yvonne is trying to decide whether she should buy a weekly bus pass.
On Monday, Wednesday and Friday she rides the bus to and from work. On
Tuesday and Thursday she rides the bus to work, but gets a ride home with
her friends.



Should Yvonne buy a weekly bus pass? 
Explain your answer. 